We study biological evolution in a high-dimensional genotype space in the regime of rare mutations 
and strong selection. The population performs an uphill walk which terminates at local fitness 
maxima. Assigning fitness randomly to genotypes, we show that the mean walk length is logarithmic 
in the number of initially available beneficial mutations, with a prefactor determined by the tail of the 
fitness distribution. This result is derived analytically in a simplified setting where the mutational 
neighborhood is fixed during the adaptive process, and confirmed by numerical simulations. 

The adaptation of a population to a novel environment is a fundamental process of evolutionary
biology which continues to attract considerable attention from theoretical [1] as well as
experimental [2] perspectives. Adaptation is driven by the occurrence of mutations that are
beneficial in the new environment and therefore spread in the population, leading to an increase
of fitness over time. This process displays a variety of dynamical patterns [3] that depend on the
supply of beneficial mutations (governed by the product of population size $M$ and mutation rate
$U$) as well as on the structure of the fitness landscape, which encodes how the genetic
configuration of an organism (its genotype) affects the number of offspring it will leave in the
next generation.

A particularly simple, yet biologically relevant limit of adaptive dynamics is the regime of
strong selection and weak mutation (SSWM), where mutations are sufficiently rare to be treated as
independent events, $MU \ll 1$, and selection is strong enough for deleterious mutations (which
decrease fitness) to be unable to spread. In the SSWM regime the population is genetically
homogeneous most of the time, and its dynamics can be described by a point in the space of
genotypes which performs an adaptive walk towards higher fitness. Because of the low mutation rate
such a walk is constrained to move by single mutational steps, and it terminates when a local
fitness maximum is reached, where no nearest-neighbor genotype confers higher fitness. Despite its
strongly simplified nature, the adaptive walk model is in principle amenable to quantitative tests
in microbial evolution experiments [7-10].

In the present Letter we study the length of such adaptive walks in a simple model of a rugged
fitness landscape, where the fitness values $F_i$ of genotypes $i$ are assumed to be independent
random variables drawn from a common probability density $p(F)$. The genotype space is a
generalized hypercube formed by sequences of $L$ letters drawn from an alphabet of size $a$, such
that each genotype has $N = (a-1)L$ single mutant neighbors [11]. The walk is then specified by
the transition probability $P_{ij}$ from genotype $i$ to a neighboring genotype $j$ of higher
fitness, $F_j > F_i$. In the SSWM regime $P_{ij}$ is proportional to the fixation probability of
the corresponding beneficial mutation, i.e. the probability that it will become dominant rather
than going extinct due to demographic fluctuations [12], [13]. When the fitness difference
$\Delta F_{ij} = F_j - F_i$ between the initial and final genotype is small in absolute terms,
$|\Delta F_{ij}| \ll 1$, while still maintaining the strong selection condition
$M|\Delta F_{ij}| \gg 1$, the fixation probability is proportional to $\Delta F_{ij}$, and
normalization leads to the expression

$$P_{ij} = \frac{F_j - F_i}{\sum_{k:\,F_k > F_i} (F_k - F_i)}. \qquad (1)$$

Here the sum runs over all neighbors $k$ of genotype $i$ with $F_k > F_i$. After the transition
the population has fitness $F_j$ and encounters a new set of random fitness values (apart from the
fitness $F_i$ of the preceding genotype, which is however inaccessible because $F_i < F_j$).
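A single step of the walk can be sketched in a few lines of Python. This is a minimal illustration of the rule (1), in which each fitter neighbor is weighted by its fitness gain $F_j - F_i$ and the weights are normalized; the function names `transition_probabilities` and `adaptive_step` are chosen here for illustration and do not appear in the Letter.

```python
import random


def transition_probabilities(current_fitness, neighbor_fitnesses):
    """Weights proportional to the fitness gain F_j - F_i, as in Eq. (1).

    Non-beneficial neighbors get probability zero. Returns None when no
    neighbor is fitter, i.e. when the current genotype is a local maximum.
    """
    gains = [max(f - current_fitness, 0.0) for f in neighbor_fitnesses]
    total = sum(gains)
    if total == 0.0:
        return None
    return [g / total for g in gains]


def adaptive_step(current_fitness, neighbor_fitnesses, rng=random):
    """Draw the index of the next genotype, or None at a local maximum."""
    probs = transition_probabilities(current_fitness, neighbor_fitnesses)
    if probs is None:
        return None
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

For a genotype of fitness 1.0 with neighbors of fitness 0.5, 2.0 and 4.0, the gains are 0, 1 and 3, so the two fitter neighbors are chosen with probabilities 1/4 and 3/4.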

Assuming that $n$ fitter neighboring genotypes are available at the starting point of the
adaptive walk, we ask for the mean number of steps $\ell(n, N)$ that are required to reach a local
fitness maximum. Since most mutations available to a viable genotype are expected to be
deleterious or neutral [14], we are mainly interested in the behavior of $\ell$ when
$N \gg n \gg 1$. Simplified variants of this problem have been considered in previous work. In the
random adaptive walk the dependence of the transition probability on fitness differences is
ignored, and all available fitter neighbors are chosen with equal probability, which leads to
$\ell_{\mathrm{random}} \approx \ln n + C_{\mathrm{random}}$ with $C_{\mathrm{random}} \approx 1.1$
[16]. On the other hand, for greedy walks, which always move to the neighboring genotype of highest
fitness, the walk length remains finite for $N, n \to \infty$ and attains a limiting value of
$\ell_{\mathrm{greedy}} = e - 1 \approx 1.71$ [17].

For the full problem defined by the fitness-dependent transition probability (1) we show below
that the asymptotic behavior of the mean walk length is generally logarithmic, with a coefficient
that depends on the form of the tail of the fitness distribution $p(F)$. According to extreme
value theory (EVT), the tail can be represented by the generalized Pareto form [18]-[21]

$$p(F) = (1 + \kappa F)^{-\frac{\kappa+1}{\kappa}}, \qquad (2)$$

where the shape parameter $\kappa$ serves to distinguish between the different universality
classes of EVT [22]. For $\kappa > 0$ the density (2) is defined for all $F > 0$ and decays as a
power law, representing the Fréchet class of EVT, whereas for $\kappa < 0$ its support is
restricted to the interval $[0, -1/\kappa]$ and the distribution belongs to the Weibull class. The
Gumbel class, comprising distributions of unbounded support that decay faster than a power law, is
recovered in the limit $\kappa \to 0$. In previous work [20] it has been shown that the adaptive
walk with fitness distribution (2) reduces to the random (greedy) limit for $\kappa \to -\infty$
($\kappa \to \infty$). For $\kappa \to -\infty$ the density (2) develops a $\delta$-function
singularity at the upper boundary of its support, which implies that all available mutants have
the same fitness and (1) reduces to a random choice. On the other hand, for $\kappa \to \infty$
the density (2) becomes extremely broad, such that the fitness of the most fit mutant in a
neighborhood is typically much larger than all other fitness values and (1) reduces to the greedy
rule.
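Fitness values with a prescribed tail can be drawn by inverting the survival function. The sketch below assumes the unit-scale generalized Pareto density $p(F) = (1+\kappa F)^{-(\kappa+1)/\kappa}$, whose survival function is $(1+\kappa F)^{-1/\kappa}$ and reduces to $e^{-F}$ for $\kappa \to 0$; the helper names `gpd_quantile` and `sample_gpd` are hypothetical.

```python
import math
import random


def gpd_quantile(u, kappa):
    """Fitness F such that a random fitness exceeds F with probability u.

    Inverts the survival function (1 + kappa*F)**(-1/kappa) for 0 < u <= 1;
    kappa = 0 is the exponential (Gumbel class) limit.
    """
    if kappa == 0.0:
        return -math.log(u)
    return (u ** (-kappa) - 1.0) / kappa


def sample_gpd(kappa, rng=random):
    """Draw one fitness value by inverse-transform sampling."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return gpd_quantile(u, kappa)
```

For $\kappa = -1$ the samples are uniform on the unit interval, and for any $\kappa < 0$ they stay inside $[0, -1/\kappa]$, in line with the bounded support of the Weibull class.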

In terms of the parametrization (2), our main result for the mean walk length reads

$$\ell \approx \beta \ln n \quad \text{with} \quad \beta = \frac{1-\kappa}{2-\kappa} \quad \text{for } \kappa < 1. \qquad (3)$$

This expression recovers the random limit ($\beta = 1$) for $\kappa \to -\infty$, and shows that
the greedy limit ($\beta = 0$) is attained at $\kappa = 1$, where the density (2) ceases to have a
finite first moment. The result $\beta = 1/2$ for the Gumbel class was previously obtained
numerically by Orr [6] (see below), and analytically by Jain and Seetharaman [23] using an
approach along the lines of [16]. Surprisingly, the expression (3) also appears in the context of
a completely different evolution model of quasispecies type, which applies in the limit of
infinite populations [24]-[26]. The reason for this coincidence will be discussed at the end of
the paper.

FIG. 1. (Color online) Illustration of the two processes involved in a step of the adaptive walk.
Starting from a genotype of fitness rank $i$ in its current mutational neighborhood (upper fitness
axis), the population moves to rank $j < i$ with probability $P_{ij}$. In the new neighborhood
(lower fitness axis) the rank of the current genotype is $j'$. In the Gillespie approximation the
old and the new neighborhoods are the same.

The Gillespie approximation. Our analysis is based on an approximation first introduced by
Gillespie. The key idea is to ignore the change in available fitness values that occurs after a
jump of the adaptive walk, which implies that the entire adaptive process proceeds in a single,
fixed neighborhood (Fig. 1). The expected length of the walk is then equal to the first passage
time (or absorption time) of the Markov chain defined by the transition probability (1) for a
fixed set of fitness values $F_k$. For the following discussion it will be convenient to label the
fitness values by their rank, such that $F_1 > F_2 > \dots > F_N$. The mean absorption time to the
final state of maximal fitness $F_1$, starting from fitness rank $n$, is then given by

$$t_n = H_{n-1} - \frac{1}{n-1} \sum_{i=1}^{n-1} \frac{\lambda_i}{\lambda_n} - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n-1} \frac{\lambda_i}{j(j-1)\,\lambda_j}, \qquad (4)$$

where $H_k = \sum_{i=1}^{k} 1/i$ is the $k$th harmonic number, and

$$\lambda_j = \sum_{k=1}^{j-1} k\,(F_k - F_{k+1}) = \sum_{k=1}^{j-1} k\,\Delta_k \qquad (5)$$

with $\lambda_1 = 0$ and fitness gaps $\Delta_k = F_k - F_{k+1}$. Since
$\sum_{k=1}^{j-1} k\,\Delta_k = \sum_{k=1}^{j-1} (F_k - F_j)$, the quantity $\lambda_j$ is the
normalization of the transition probability (1) out of rank $j$. Because fitness only increases
during the process, the absorption time is independent of the fitness values
$F_{n+1}, F_{n+2}, \dots, F_N$ of rank larger than the starting rank $n$.
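The closed form for the absorption time can be checked against the first-step recursion of the Markov chain, $t_1 = 0$ and $t_j = 1 + \sum_{k<j} P_{jk} t_k$ with $P_{jk} = (F_k - F_j)/\lambda_j$. The sketch below uses exact rational arithmetic and compares the recursion with $t_n = H_{n-1} - \frac{1}{n-1}\sum_{i<n} \lambda_i/\lambda_n - \sum_{i<j<n} \lambda_i/[j(j-1)\lambda_j]$; the function names are illustrative, and the fitness values are assumed to be strictly decreasing in rank.

```python
from fractions import Fraction


def absorption_time_recursive(F, n):
    """Mean number of steps from rank n to rank 1 via the first-step recursion.

    F lists the fitness values in decreasing order, so F[0] has rank 1.
    """
    t = [Fraction(0)] * (n + 1)  # t[1] = 0: rank 1 is absorbing
    for j in range(2, n + 1):
        norm = sum(F[k - 1] - F[j - 1] for k in range(1, j))
        weighted = sum((F[k - 1] - F[j - 1]) * t[k] for k in range(1, j))
        t[j] = 1 + weighted / norm
    return t[n]


def absorption_time_closed(F, n):
    """Same quantity from the closed form with lambda_j = sum_{k<j} (F_k - F_j)."""
    lam = [Fraction(0)] * (n + 1)  # lam[1] = 0
    for j in range(2, n + 1):
        lam[j] = sum(F[k - 1] - F[j - 1] for k in range(1, j))
    harmonic = sum(Fraction(1, i) for i in range(1, n))
    single = sum(lam[i] for i in range(1, n)) / ((n - 1) * lam[n])
    double = sum(lam[i] / (j * (j - 1) * lam[j])
                 for i in range(1, n) for j in range(i + 1, n))
    return harmonic - single - double
```

For equally spaced fitness values $F = (4, 3, 2, 1, 0)$ both routes give $t_5 = 31/18$.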

Within the Gillespie approximation, the adaptive walk length $\ell$ is obtained by averaging the
absorption time (4) with respect to the fitness distribution $p(F)$. Gillespie observed that the
problem simplifies significantly if $p(F)$ is assumed to fall into the Gumbel universality class
of EVT. Taking the limit $N \to \infty$ at fixed $n$, the $n$ superior fitness values lie in the
tail of the distribution, and it is known that the scaled fitness gaps $k\Delta_k$ converge to
independent, identically distributed exponential random variables [22]. It then follows by
symmetry that the average ratios in (4) are
$\langle \lambda_i/\lambda_j \rangle = \frac{i-1}{j-1}$, and evaluation of the sums yields the
simple result $\langle t_n \rangle = \frac{1}{2}(H_{n-1} + 1) \approx \frac{1}{2}\ln n +
\frac{1}{2}(\gamma + 1)$, where $\gamma \approx 0.577215\ldots$ denotes Euler's constant.
Simulations of the full problem show that the mean walk length differs from this approximate
result only by an offset in the constant correction term, which is given by
$C_0 \approx \frac{1}{2}(\gamma + 1) + 0.44$ [6]. A similar calculation for the model with random
choice of fitter neighbors yields a mean absorption time of
$\langle t_n \rangle = H_{n-1} \approx \ln n + \gamma$, which again differs from the mean walk
length of the full model [15], [16] (quoted above) only by a small shift in the constant term. We
will show below that the close agreement between the Gillespie approximation and the full model
extends to general fitness distributions, and provide a qualitative explanation for this behavior.
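The Gumbel-class average can be verified exactly. The following sketch substitutes the symmetric ratio $(i-1)/(j-1)$ for $\lambda_i/\lambda_j$ in the single and double sums of the absorption time (4) and compares the outcome with $\frac{1}{2}(H_{n-1}+1)$; the function names are chosen here for illustration.

```python
from fractions import Fraction


def harmonic(k):
    """k-th harmonic number H_k as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, k + 1))


def gumbel_mean_absorption_time(n):
    """Average of the absorption time with <lambda_i/lambda_j> = (i-1)/(j-1)."""
    def ratio(i, j):
        return Fraction(i - 1, j - 1)

    single = sum(ratio(i, n) for i in range(1, n)) / (n - 1)
    double = sum(ratio(i, j) / (j * (j - 1))
                 for i in range(1, n) for j in range(i + 1, n))
    return harmonic(n - 1) - single - double
```

Because the absorption time is linear in the ratios $\lambda_i/\lambda_j$, averaging term by term is all that is needed, and the identity holds for every $n \geq 2$, not only asymptotically.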
General fitness distributions. We now turn to the approximate evaluation of the absorption time
(4) for the other EVT classes. As a representative of the Fréchet class we choose the Pareto
distribution $p(F) = \mu F^{-(\mu+1)}$, $F > 1$, which is a shifted and rescaled version of (2)
with $\mu = 1/\kappa$. A straightforward calculation shows that the expected value of the $k$th
out of $N$ fitness values is given by

$$\langle F_k \rangle = \frac{\Gamma(N+1)\,\Gamma(k - \frac{1}{\mu})}{\Gamma(N + 1 - \frac{1}{\mu})\,\Gamma(k)} \qquad (6)$$

for $N > k > 1$. To estimate the fitness gap we take the derivative with respect to $k$ [27],
$\langle \Delta_k \rangle \approx -\frac{d}{dk} \langle F_k \rangle \sim N^{1/\mu}
k^{-(1 + 1/\mu)}$. Approximating the sum in (5) by an integral we then find
$\lambda_j \sim N^{1/\mu} j^{1 - 1/\mu}$, and hence $\lambda_i/\lambda_j \sim (i/j)^{1 - 1/\mu}$.
Inserting this into (4) and replacing sums by integrals we see that the first sum converges to a
constant for $n \to \infty$, while the second, double sum diverges logarithmically as
$\frac{\mu}{2\mu - 1} \ln n$. Thus to leading order we find

$$\langle t_n \rangle \approx \left(1 - \frac{\mu}{2\mu - 1}\right) \ln n = \frac{\mu - 1}{2\mu - 1} \ln n,$$

which is identical to (3) with $\kappa = 1/\mu$.



The calculation for the Weibull class of distributions with bounded support is similar. We
consider distributions on the unit interval of the form $p(F) = (\nu + 1)(1 - F)^{\nu}$ with
$\nu > -1$, corresponding to (2) with $\kappa = -\frac{1}{\nu + 1}$. The mean of the $k$th out of
$N$ values drawn from this distribution is given by
$\langle F_k \rangle \approx 1 - (k/N)^{\frac{1}{\nu+1}}$ for $N \gg k \gg 1$, and along the same
lines of reasoning used previously we find that
$\lambda_i/\lambda_j \sim (i/j)^{\frac{\nu+2}{\nu+1}}$. Again, this implies that the first sum on
the right hand side of (4) converges, whereas the second double sum diverges logarithmically,
leading finally to

$$\langle t_n \rangle \approx \left(1 - \frac{\nu + 1}{2\nu + 3}\right) \ln n = \frac{\nu + 2}{2\nu + 3} \ln n,$$

in agreement with (3). The result for the uniform distribution ($\nu = 0$) was also obtained in
[23].
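Both calculations follow one pattern, which can be written out once. The sketch below assumes a power-law ratio $\lambda_i/\lambda_j \sim (i/j)^{a}$ with an exponent $a > 0$ (a symbol introduced here, not in the Letter), keeps only the logarithmically divergent terms of the absorption time, and uses $H_{n-1} \approx \ln n$.

```latex
% Leading-order evaluation of the double sum for \lambda_i/\lambda_j \sim (i/j)^a
\begin{align}
\sum_{j=2}^{n-1} \frac{1}{j(j-1)} \sum_{i=1}^{j-1} \left(\frac{i}{j}\right)^{a}
  &\approx \int^{n} \frac{dj}{j^{2}} \int_{0}^{j} di \left(\frac{i}{j}\right)^{a}
   = \int^{n} \frac{dj}{j^{2}} \, \frac{j}{1+a}
   \approx \frac{\ln n}{1+a}, \\
\langle t_n \rangle
  &\approx \left(1 - \frac{1}{1+a}\right) \ln n = \frac{a}{1+a} \, \ln n .
\end{align}
% Frechet class: a = 1 - 1/\mu        gives (\mu - 1)/(2\mu - 1)
% Weibull class: a = (\nu+2)/(\nu+1)  gives (\nu + 2)/(2\nu + 3)
% In both cases a = 1 - \kappa, hence a/(1+a) = (1 - \kappa)/(2 - \kappa).
```

Written in terms of the EVT index, both classes thus correspond to the single exponent $a = 1 - \kappa$, which is why one formula covers the whole range $\kappa < 1$.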
Simulations. Next we compare the prediction (3) to simulations, using both the full adaptive walk
model and the simplified Gillespie model in a fixed mutational neighborhood. In the simulations of
the full model, we avoided an explicit representation of the genotype space by creating the
fitness values encountered during the walk 'on the fly'. This ignores the possibility of the same
genotype being encountered more than once during the walk, which is however negligible for large
$N$ [16]. The total size of the neighborhood was $N = 4000$ in all cases, the starting rank was
varied from $n = 2^2 = 4$ to $n = 2^{11} = 2048$ in factors of 2, and results were averaged over
1000 independent realizations. As can be seen in Fig. 2, the asymptotic prediction (3) is well
satisfied in both kinds of simulations.
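A simulation of this kind can be sketched as follows. This is not the authors' code: it assumes generalized Pareto fitness values with unit scale, starts the walk at a given rank within a freshly drawn neighborhood of size $N$, and either keeps that neighborhood fixed (Gillespie approximation) or redraws $N-1$ new neighbors after every step (full model, fitness values created on the fly). All function and parameter names are illustrative.

```python
import math
import random


def sample_fitness(kappa, rng):
    """Inverse-transform sample from the unit-scale generalized Pareto tail."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    if kappa == 0.0:
        return -math.log(u)
    return (u ** (-kappa) - 1.0) / kappa


def predicted_slope(kappa):
    """Prefactor beta = (1 - kappa)/(2 - kappa) of ln n, valid for kappa < 1."""
    return (1.0 - kappa) / (2.0 - kappa)


def walk_length(start_rank, N, kappa, rng, fixed_neighborhood):
    """Number of steps until no fitter neighbor is left."""
    values = sorted((sample_fitness(kappa, rng) for _ in range(N)), reverse=True)
    current = values[start_rank - 1]
    fitter = values[:start_rank - 1]
    steps = 0
    while fitter:
        gains = [f - current for f in fitter]  # weights of the transition rule
        current = rng.choices(fitter, weights=gains, k=1)[0]
        steps += 1
        if fixed_neighborhood:
            fitter = [f for f in fitter if f > current]
        else:
            # The previous genotype is less fit and can be left out.
            fresh = (sample_fitness(kappa, rng) for _ in range(N - 1))
            fitter = [f for f in fresh if f > current]
    return steps


def mean_walk_length(start_rank, N, kappa, runs, seed, fixed_neighborhood):
    """Average walk length over independent realizations."""
    rng = random.Random(seed)
    total = sum(walk_length(start_rank, N, kappa, rng, fixed_neighborhood)
                for _ in range(runs))
    return total / runs
```

Calling `mean_walk_length(n, 4000, kappa, 1000, seed, fixed)` for $n = 4, 8, \dots, 2048$ and plotting the result against $\ln n$ would mirror the setup described above; the slope can then be compared with `predicted_slope(kappa)`. For exponential fitness ($\kappa = 0$) in a fixed neighborhood the mean should be close to $\frac{1}{2}(H_{n-1} + 1)$.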

FIG. 2. (Color online) Simulation results for the full adaptive walk model (full symbols and
lines) and the Gillespie approximation (open symbols and dashed lines). Slopes of lines are given
by (3) and intercepts have been fitted to the numerical data. (a) Fréchet class with
$\mu = 1/\kappa = 10/7$, 2 and 5. The fitted intercepts are $c_\kappa = c_{7/10} = 1.60$,
$c_{1/2} = 1.39$ and $c_{1/5} = 1.25$ for the full model and $\tilde c_{7/10} = 1.27$,
$\tilde c_{1/2} = 1.00$, $\tilde c_{1/5} = 0.84$ for the Gillespie approximation. (b) Weibull
class with $\nu = -(1 + 1/\kappa) = -0.75$, $-0.5$ and $0.5$. Fitted intercepts are
$c_{-2/3} = 1.18$, $\tilde c_{-2/3} = 0.66$, $c_{-2} = 1.12$, $\tilde c_{-2} = 0.61$,
$c_{-4} = 1.00$ and $\tilde c_{-4} = 0.56$. In all cases $c_\kappa > \tilde c_\kappa$.

To rationalize the observed close agreement between the Gillespie approximation and the full
adaptive walk, we analyze the effect that the two processes involved in a single step of the walk
have on the rank of the current genotype (Fig. 1). In the first process, the choice of a fitter
neighbor according to the transition probability $P_{ij}$, the rank of the genotype changes by an
amount that is proportional to the initial rank; to be specific, the expected new rank $j$
conditioned on the original rank $i$ is given by $\langle j \rangle = \frac{\beta}{2}\, i$ for
$i \gg 1$ [27]. The change of rank due to the subsequent change of the mutational neighborhood
(which is omitted in the Gillespie approximation) can be deduced from the classic analysis of the
number of exceedances [28], [29], which shows that the expected new rank $j'$ conditioned on the
old rank $j$ is $j + 1$, with a variance of order $j$. Thus for $i, j \gg 1$ the change in rank
due to the change in neighborhood is a small perturbation (of relative size $\sim 1/i$) of the
change that occurs in the first process, which explains the quantitative accuracy of the Gillespie
approximation. The fact that the change of neighborhood on average increases the rank is
consistent with the numerical observation that the adaptive walks in the full model are always
slightly longer than in the Gillespie approximation (Fig. 2).
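The proportionality between the old and the new rank in the first process can be illustrated numerically. The sketch below is a heuristic check rather than a derivation: it replaces the random fitness values by the deterministic quantiles of a unit-scale generalized Pareto distribution, computes the mean rank reached in one step with weights proportional to the fitness gain, and compares the ratio $\langle j \rangle / i$ with $\beta/2$, where $\beta = (1-\kappa)/(2-\kappa)$. The function names and the quantile substitution are assumptions made for this illustration.

```python
import math


def gpd_quantile(u, kappa):
    """Fitness exceeded with probability u for the generalized Pareto tail."""
    if kappa == 0.0:
        return -math.log(u)
    return (u ** (-kappa) - 1.0) / kappa


def expected_new_rank(i, N, kappa):
    """Mean rank after one step from rank i, for quantile-spaced fitness values.

    Rank k is assigned the fitness gpd_quantile(k / (N + 1), kappa), and the
    step to rank j < i has weight F_j - F_i.
    """
    F = [gpd_quantile(k / (N + 1), kappa) for k in range(1, i + 1)]
    gains = [F[j - 1] - F[i - 1] for j in range(1, i)]
    return sum(j * g for j, g in zip(range(1, i), gains)) / sum(gains)
```

For the uniform distribution ($\kappa = -1$) the weights are proportional to $i - j$ and the mean new rank is exactly $(i+1)/3$, which approaches $\frac{\beta}{2} i = i/3$ for large $i$.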

Relation to quasispecies models. The quasispecies approach to evolution assumes very large
populations, $MU \to \infty$, such that demographic fluctuations are absent and the adaptive
process is completely deterministic [30]. In an uncorrelated random fitness landscape the most
populated genotype then performs a kind of 'adaptive flight', which is essentially constrained to
move between local fitness maxima and terminates only when the global fitness maximum is reached
[24], [25]. In the simple case of a one-dimensional genotype space, the length of such an adaptive
flight depends logarithmically on the number of genotypes with a prefactor given precisely by the
expression in (3), a behavior that was first observed numerically [24] and subsequently derived
analytically in [26]. The formal relation to the adaptive walk problem can be traced back to the
fact that the transition probability of the adaptive flight, which describes the rate at which the
most populated genotype jumps from one fitness peak to the next, depends linearly on the fitness
difference between the two peaks in the same way as the fixation probability (1) [26]. This
structure also appears in the analysis of the collision statistics of a one-dimensional gas with
quenched random velocities.

Employing a completely different mathematical approach, Sire et al. [26] computed the mean length
of the adaptive flights as well as the corresponding variance (see also [31]). Using their result
one finds that the index of dispersion $I$ (defined as the ratio of the variance to the mean)
depends on the EVT parameter $\kappa$ according to the

simple expression / = ^4^~^ , which takes its minimal 
value / = | for the Gumbel class (k — 0) and approaches 
unity for $\kappa \to -\infty$ as well as for $\kappa \to 1$. This formula reproduces the results
obtained in [23] for $\kappa = 0$ and $\kappa = -1$, and we have checked numerically that it
applies to the full adaptive walk problem for general $\kappa$. Thus, while the walk length has a
Poisson distribution in the case of random dynamics [16], in general the fluctuations are
sub-Poissonian.

Conclusions. We have analyzed a simple, paradigmatic model for the evolution of populations
subject to rare mutations and strong selection, and derived a precise asymptotic relation between
the length of adaptive walks and the tail of the underlying fitness distribution. While the
predicted asymptotics may be difficult to observe in experiments, the EVT shape parameter $\kappa$
can be estimated experimentally [19], and examples with $\kappa = 0$ [32], $\kappa < 0$ [33] and
$\kappa > 0$ [34] have been identified.

An important restriction of our model is the assumption that fitness values of different
genotypes are uncorrelated. Indeed, a recent study comparing the distributions of beneficial
fitness effects encountered during the first and second steps of an adaptive walk found strong
evidence for fitness correlations between neighboring genotypes [10]. Such correlations are likely
to significantly affect the results presented here, and will be addressed in the future.